With the rapidly increasing integration of wind energy into the modern energy grid, wind power
prediction (WPP) plays an important role in the planning and operation of electrical distribution
systems. However, wind energy time series are nonlinear and non-stationary, which makes accurate
prediction a considerable challenge. This paper proposes an intelligent wind power forecasting model
and evaluates it for short-term, medium-term, and long-term forecasting. It uses statistical and
machine learning approaches to find the best model for multiperiod forecasting. The model has been
tested on historical data from the Sotavento wind farm, located in Galicia, Spain. The experimental
results show that random forest has better accuracy than the other machine learning models for
short-term, medium-term, and long-term forecasting. The power prediction accuracy of the proposed
model has been evaluated using the RMSE and MAE metrics. The proposed model has shown better accuracy
for medium-term and long-term forecasting. The accuracy is improved by 72.12% in the medium-term case
and 50.49% in the long-term case.

Keywords: RMSE; Statistical approach; Wind power forecasting

This is an open access article under the CC BY-SA license.

1. INTRODUCTION

Intelligent and smart systems in science and technology have increased the comfort level of human
life. However, the resulting demand creates energy crises. Conventional fossil fuel energy sources
generate pollution [1]; because of this, renewable energy sources such as wind are gaining more
attention and importance. Worldwide, among renewable resources such as solar energy, bioenergy,
hydropower, and tidal power, wind power is considered one of the fastest growing sources of power
generation because the kinetic energy of the wind can be harnessed economically [2]. Wind power is a
sustainable and clean source of energy that does not pose hazards to the environment. Hence, wind
power generation is a major goal of many countries. Wind power generation [3] is quite an uncertain
process because of the intermittent and chaotic nature of wind, which could lead to huge losses in
the energy distribution sector. Accurate prediction of the power output of wind farms has therefore
become crucial and remains challenging. The electricity generated by a windmill changes with
fluctuations in wind speed and direction. The accuracy of wind power prediction relates directly to
profitability and penalties.

In the future, renewable energy is expected to be the primary source, supplying 100% of energy or
close to it; load balancing in the grid will therefore have to cope with the intermittent nature of
wind energy. In this context, more reliable wind energy forecasting techniques should be developed,
not only to maintain grid stability but also to protect the overall system. A small 1% increase in
forecast quality would save US$140 million in the United States [4]. The definition of wind power
forecasting periods varies across the literature. They are mainly categorized into four spans: very
short term, short term, medium term, and long term. Table 1 summarises the classification, temporal
range, and application purpose of the different horizons based on the review in [5].


Table 1. Time horizon for wind power prediction approaches 


Time horizon | Time range | Applications
Very short term | Up to 90 minutes ahead | Regulation actions, intraday trading, to maintain grid stability, to reduce penalties
Short term | Up to 6 hours ahead | Planning of load dispatch, for taking decision to increment/decrement load
Medium term | Up to 1 day ahead | Security for the next day electricity market
Long term | 1 day to 1 week or more ahead | Maintenance planning, optimum operation


Because wind speed is central to predicting wind power, the authors of [6] first estimated wind speed
with appropriate methods and then predicted wind power from it; this is called the indirect method.
In the direct approach, wind power is forecast directly, without a preceding phase in which wind
speed is estimated. Many researchers have focused on developing reliable wind power forecasting
models, and various models have been proposed. The models used by researchers [7] are mainly
classified into physical [8], statistical [9], and hybrid [10] classes.

The primary part of wind power forecasting is estimating future values of the meteorological
variables needed at the wind farm level, because wind power is directly related to weather
conditions. Weather forecasting [11] can be handled by global or regional models with different
resolutions, using numerical weather prediction (NWP) [12] models. NWP models are generally run on
supercomputers in meteorological departments or research institutes, in order to handle higher
resolutions and better representations of atmospheric processes. These models are based on
mathematical equations that represent the state of the atmosphere, including turbulence, pressure,
and radiation levels. The Navier-Stokes equations, which describe the motion of viscous fluids, are
frequently employed alongside other physical laws. These models serve not one particular purpose but
a variety of industrial and scientific applications. They predict not only wind speed but also
atmospheric conditions at a particular location and time. The weather variables required as input
for wind energy forecasts include not only wind speed and direction but also pressure, humidity, and
temperature.

Pearre and Swan [9] describe the relationship between wind speed or power predictions and
explanatory variables, including historical online measured data and NWP data [13]. Statistical
models typically use historical data and NWP data to build models. These approaches are easy to
model and inexpensive. Traditional statistical methods apply time series models to predict future
wind speed or wind power. For univariate time series analysis, many types of time series models are
used, such as the moving average model (MA), the autoregressive model (AR), the autoregressive
moving average model (ARMA) [14], and the autoregressive integrated moving average model (ARIMA).
ARMA (p, q) combines a p-order autoregressive part with a q-order moving average part. If the moving
average order q is zero, the model reduces to an autoregressive model AR (p) of order p; if the
autoregressive order p is zero, it reduces to a moving average model MA (q) of order q. The ARIMA
model is a generalisation of the ARMA model that adds differencing of the series. In summary,
traditional statistical approaches are mainly used for short-term and very short-term forecasting.
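
For reference, the ARMA (p, q) model mentioned above is commonly written as follows. The symbols c (constant), \phi_i (autoregressive coefficients), \theta_j (moving average coefficients), and \varepsilon_t (white-noise error) follow the usual textbook notation and are not defined in this paper:

```latex
X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```

ARIMA (p, d, q) applies the same form to the series after it has been differenced d times.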

With the rapid development of machine learning in the past 20 years, many non-linear forecasting
models have been introduced for wind power forecasting [15]. In the literature, the most commonly
used algorithms are KNN [16], SVM [17], and random forests [18]. The purpose of hybrid models [19]
is to exploit the strengths of each model for optimal predictive performance. Because the
information contained in each individual prediction method is limited, the hybrid method integrates
the information from the individual models, combining the benefits of multiple prediction methods
and improving prediction accuracy [20]. Hybrid approaches combine different techniques, for example
short-term with medium-term models, or physical with statistical approaches. Shi et al. [21]
proposed two hybrid models for wind speed and power forecasting, ARIMA-SVM and ARIMA-ANN. They
conducted a systematic and complete examination of two case studies for wind speed and wind power
generation. The results show that the proposed hybrid approaches do not always produce superior
forecasting performance for all forecasting time horizons. Zhao et al. [22] developed a hybrid wind
prediction method consisting of an NWP model and an ANN model [23]. The NWP model is set up by
combining a weather research and forecasting (WRF) system with a global forecasting system (GFS) to
predict weather parameters.

The scope and objective of this work is to reduce the penalty by contributing the most accurate
models achievable with the available resources and parameters. The main objective is to reduce large
forecasting errors, which are responsible for most of the problems and costs in system operation.
This paper also proposes and analyzes wind energy prediction models in order to identify the best
evaluation parameter for statistical and machine learning (ML) models. The paper is organized as
follows: the ‘proposed methodology’ section gives a detailed description of the wind power
forecasting method and the generic architecture of the proposed model. The ‘experimental analysis’
section describes the results obtained through experimentation, and finally the ‘conclusion’ section
provides relevant conclusions.


2. PROPOSED METHODOLOGY 

Although physical models are widely used, they have some disadvantages. Consumers rely on weather
forecast service providers for weather services. The available time scales are always fixed, and
forecasts are only available at specific times. Due to the chaotic nature of the atmosphere,
providing good predictions using a physical model is a very challenging task. Therefore, for
short-term prediction, other approaches such as statistical learning are used. The literature survey
presented by Galphade et al. [23] shows that statistical methods do not adapt to non-linear wind
data, do not easily handle large amounts of data, and cannot predict over long periods of time, so
statistical methods cannot be the first recommended method of prediction.

To address this gap in the literature, a wind power forecasting model is proposed that uses
statistical and machine learning models to find the best model for multiperiod forecasting. The
proposed architecture comprises six major phases: data collection, data pre-processing, feature
selection, stationarity test, model building, and model performance evaluation, as shown in Figure 1.


[Figure 1 flowchart. Weather data and windmill data feed data acquisition, followed by data
preprocessing (outlier detection, noise elimination, missing value processing), parameter selection
using the correlation matrix, the stationarity test (co-integration test), model building (model
variable selection, order estimation, structure selection), model construction with machine learning
(linear regression, polynomial regression, decision tree regression, random forest regression) and
statistical methods (naive forecasting, VAR), forecasting error evaluation (RMSE, MAE), and model
performance evaluation.]

Figure 1. Wind power forecasting architecture 

Historical data from the Sotavento wind farm in Galicia, Spain (43.354377°N, 7.881213°W) are used
in this paper. The corresponding weather data can be obtained from the World Weather Online forecast
system (https://www.worldweatheronline.com/) using the latitude and longitude coordinates of the
Sotavento wind farm. The proposed forecasting model is tested using five years of data from 2014 to
2019. The data have an hourly resolution, and the forecast horizons are short, medium, and long
term. The parameters in this dataset and their units are: dew point (°C), cloud cover, humidity
(kg kg-1), pressure (kPa), temperature (°C), wind speed (m/s), wind direction (degrees), and energy
(kWh).

In order to predict future values, the series should not contain any trend, seasonal, or cyclic
component. Differencing [24] is a widely used method to remove these components. After the trend,
seasonality, and cycles are removed, the series becomes stationary, with a stable mean and variance.
Time series usually come from live observations or sensors and may contain noise and outliers [25],
caused by sensor errors or equipment downtime. The data should therefore be cleaned before analysis
to avoid wrong conclusions. Noise can be removed using traditional signal processing techniques such
as digital filters or wavelet thresholding [26]. To filter outliers, k-nearest neighbor clustering
[27] is widely used. Another issue is scaling: normalization is used to bring all the data onto a
common scale. The correlation matrix [28] of the dataset is shown in Figure 2. Correlation values
range from -1 to +1, where -1 indicates a strong negative relationship, +1 a strong positive
relationship, and values near 0 a weak linear relationship. According to the correlation matrix, dew
point and temperature are negatively related, so they can be neglected.

Figure 2. Correlation matrix
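
The preprocessing steps described above (differencing, normalization, and the correlation matrix) can be sketched with pandas as follows. This is a minimal illustration on synthetic data, not the pipeline used in the study: the column names, the synthetic signal, and the choice of first-order differencing and min-max scaling are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data standing in for the wind farm records.
# Column names are hypothetical, not taken from the original dataset.
rng = np.random.default_rng(0)
n = 240
index = pd.date_range("2014-01-01", periods=n, freq=pd.Timedelta(hours=1))
speed = 8 + 2 * np.sin(np.arange(n) / 24) + rng.normal(0, 0.5, n)
df = pd.DataFrame(
    {
        "speed": speed,
        "temperature": 15 + rng.normal(0, 3, n),
        "energy": 0.5 * speed ** 3 + rng.normal(0, 20, n),
    },
    index=index,
)

# First-order differencing: each value minus the previous one.
# The first row has no predecessor, becomes NaN, and is dropped.
differenced = df.diff().dropna()

# Min-max normalization rescales every column to the [0, 1] range.
normalized = (df - df.min()) / (df.max() - df.min())

# Pearson correlation matrix; every entry lies between -1 and +1.
corr = df.corr()
print(corr.round(2))
```

In practice the differencing order and the scaling method would be chosen from the data, for example by checking stationarity after each differencing step.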


To confirm the stationarity of time series data, a unit root test is applied to a univariate time
series and a co-integration test to a multivariate time series. The dataset used for experimentation
is a multivariate time series. The adfuller function is used, which returns a tuple of statistics
such as the test statistic, the p-value, the number of lags used, and the number of observations
used. If the p-value is less than the significance level (0.05), the null hypothesis, which states
that a unit root is present, is rejected. All the univariate time series have p-values less than
0.05, which is taken to mean that the multivariate time series is stationary. Different models were
selected in this study: naive forecasting and vector autoregression (VAR) are the two statistical
approaches, and the machine learning models are multiple linear regression (MLR), polynomial linear
regression (PLR), decision tree regression (DTR), and random forest regression (RFR).


3. EXPERIMENTAL ANALYSIS 

Experimentation was done on Google Colab using standard libraries. Scikit-learn is a widely used
machine learning library in Python; it provides tools for regression, classification, clustering,
and dimensionality reduction, among other tasks in statistical modelling and machine learning.
Statsmodels is a Python package that allows users to explore data, estimate statistical models, and
perform statistical tests. The results predicted by the proposed model are compared with the actual
generated power on all three time horizons, as shown in Figures 3(a)-(e). The dataset contains 52584
records of 8 parameters. For medium-term and long-term prediction, the same dataset is used after
resampling. The data are downsampled by decreasing their frequency: from hourly to daily for the
medium term, and from hourly to weekly for the long term.

Figure 3. Wind power forecasting (a) naive forecast, (b) multiple linear regression, (c) polynomial
linear regression, (d) decision tree regression, and (e) random forest regression
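
The downsampling from hourly to daily and weekly resolution can be sketched with pandas as follows. The column name and the synthetic values are placeholders, and the text does not state whether values were averaged or summed within each period, so the use of the mean here is an assumption.

```python
import numpy as np
import pandas as pd

# Two weeks of synthetic hourly readings starting on a Monday.
# The column name "energy" is a placeholder, not the dataset's actual header.
index = pd.date_range("2014-01-06", periods=14 * 24, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({"energy": np.arange(14 * 24, dtype=float)}, index=index)

# Downsample by averaging within each period (assumed aggregation).
daily = hourly.resample("D").mean()     # medium-term series
weekly = hourly.resample("W").mean()    # long-term series, weeks ending on Sunday

print(len(hourly), len(daily), len(weekly))
```

If energy totals per period are wanted instead of averages, `.sum()` can replace `.mean()` without other changes.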


In this experiment, the mean absolute error (MAE) and root-mean-square error (RMSE) were applied as
evaluation indicators. The RMSE and MAE of the various algorithms are given in Table 2. According to
the MAE and RMSE values, random forest has the best prediction accuracy among the machine learning
models for all three time horizons, and the multiple linear regression model is second to the random
forest. The RF model obtained the lowest error index on the training dataset, indicating good
training capability. These results indicate that the RF model performed better than the other
machine learning models. Many environmental variables have an impact on wind power predictions. In
many cases, the relationship between the output variables and the environmental variables is complex
and nonlinear. The MLR model can only explain variation in the output variables that is linear. As a
result, when the MLR model is used to fit the relationship between the dependent variable and the
environmental variables, the results are frequently unsatisfactory. The RF model does not require
assumptions about the relationship between output variables and environmental variables, and it can
handle non-linear relationships that the MLR model cannot. Furthermore, compared with other
estimation approaches, the RF model has the advantages of resistance to overfitting, insensitivity
to noise, and unbiased error rate measurement, all of which contribute to improved estimation
accuracy.
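
The two evaluation indicators, and the contrast between a linear model and a random forest on a nonlinear relationship, can be illustrated with scikit-learn. This is a sketch on synthetic data only: the saturating cubic "power curve", the noise level, the helper names rmse and mae, and the hyperparameters are assumptions and do not reproduce the study's data or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic data: a hypothetical saturating cubic power curve plus noise.
rng = np.random.default_rng(0)
speed = rng.uniform(0, 25, size=2000)
power = np.clip((speed / 12.0) ** 3, 0.0, 1.0) + rng.normal(0, 0.02, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    speed.reshape(-1, 1), power, test_size=0.25, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (rmse(y_test, pred), mae(y_test, pred))
    print(f"{name}: RMSE = {scores[name][0]:.3f}, MAE = {scores[name][1]:.3f}")
```

The random split is acceptable here because the synthetic samples are independent; for real time series a chronological split would usually be more appropriate.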


Table 2. Performance evaluation using RMSE & MAE 


Method | Technique | Short term RMSE | Short term MAE | Medium term RMSE | Medium term MAE | Long term RMSE | Long term MAE
Traditional statistical method | Naive forecasting model | 1.167 | 0.721 | 2.655 | 1.915 | 2.226 | 1.696
Traditional statistical method | Vector autoregression | 3.229 | 3.083 | 3.374 | 3.248 | 2.894 | 2.577
Machine learning | Multiple linear regression | 2.051 | 1.411 | 1.489 | 0.959 | 1.177 | 0.834
Machine learning | Polynomial linear regression | 2.109 | 1.437 | 1.714 | 1.025 | 4.244 | 2.595
Machine learning | Decision tree regression | 2.292 | 1.427 | 1.81 | 1.169 | 1.45 | 1.078
Machine learning | Random forest regression | 1.872 | 1.163 | 1.404 | 0.885 | 1.124 | 0.829


As can be seen from Figures 4(a) and (b), statistical methods perform well for short time intervals,
but the error increases as the prediction horizon grows. The machine learning algorithms, by
contrast, generally perform well for long-term forecasting. The findings demonstrate that the
overall performance of the proposed random forest model is better on the medium-term and long-term
horizons; however, it is difficult to beat the naive model for short-term prediction. Figure 4 plots
the error variation against the time horizon, which helps in determining the most suitable strategy
for wind power forecasting over various time frames. For the medium term, the reported error
variation is 1.915 against the reference model RMSE of 2.655, which suggests that the proposed
intelligent model is about 72% better on error variation. For the long term, the corresponding RMSE
improvement using the proposed model is about 51%.


[Figure 4 bar charts. Panels (a) and (b) plot the error of each model (naive forecasting model,
vector autoregression, multiple linear regression, polynomial linear regression, decision tree
regression, random forest regression) against the time horizon (short term, medium term, long term).]


Figure 4. Performance evaluation (a) RMSE and (b) MAE 
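
One common way to express an improvement over a reference model is the relative error reduction, (reference - candidate) / reference. The sketch below applies that definition to the naive and random forest RMSE values from Table 2. The definition is an assumption made for illustration; the percentages quoted in the text may have been derived differently, so the printed figures need not match them.

```python
# RMSE values copied from Table 2.
rmse_naive = {"medium term": 2.655, "long term": 2.226}
rmse_random_forest = {"medium term": 1.404, "long term": 1.124}


def relative_reduction(reference, candidate):
    """Fractional error reduction of the candidate relative to the reference."""
    return (reference - candidate) / reference


reductions = {
    horizon: relative_reduction(rmse_naive[horizon], rmse_random_forest[horizon])
    for horizon in rmse_naive
}
for horizon, value in reductions.items():
    print(f"{horizon}: {100 * value:.1f}% lower RMSE than the naive model")
```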


4. CONCLUSION

Wind power forecasting has so far been treated as a challenging problem because of the intermittency
and uncertainty of the data inputs and generation parameters. According to research, machine
learning predictions are less expensive than NWP and less vulnerable to faulty data or human
mistakes. As the world becomes more digitalized, machine learning may help to make processes more
automated and error-free. According to the state of the art, the most frequently used methods in
wind power forecasting are neural networks and RF. In this paper, popular machine learning
techniques applied to wind power forecasting have been empirically analysed. Most of these learning
algorithms have been successful in predictive analytics. The naive forecasting statistical model was
used as the reference model.

The experimental results show that, compared with the statistical approach, the machine learning
algorithms perform better in terms of accuracy. Statistical methods perform well for short-term
forecasting but fail at long-term forecasting, whereas the machine learning algorithms have shown an
overall improvement of about 72% in medium-term and 51% in long-term forecasting. Random forest
achieved the best results in terms of RMSE and MAE because RF ignores irrelevant input data and can
predict outliers. The proposed intelligent architecture of multiple machine learning algorithms was
tested on a large volume of data for short-term, medium-term, and long-term forecasting. Very
short-term forecasting was not included in this empirical study; however, the work may be extended
by combining statistical and deep learning approaches for very short-term forecasting.